Eszter Friedman, MTA SZTAKI, feszter@info.ilab.sztaki.hu
Julianna Göbölös-Szabó, MTA SZTAKI, gobolos.szabo.julianna@gmail.com
Adrienn Szabó, MTA SZTAKI, adrienn.szabo4@gmail.com [PRIMARY contact]
András Lukács, MTA SZTAKI, alukacs@sztaki.hu
We used two separate tools to solve the task; both were developed by the Data Mining and Web Search
Group, Computer and Automation Research Institute (MTA SZTAKI).
The search and visualization tool PinWallVis was designed to explore and understand
large data sets represented as networks; it was developed for other projects of
our research group at MTA SZTAKI as well, and was further adapted for the VAST 2010
Challenge. Using different algorithmic engines, PinWallVis can not only visualize
the graph with several different layouts but also offers further ways to work with
it, such as searching among the entities extracted from the data, adding extra
edges, and finding shortest paths between nodes.
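The shortest-path engine of PinWallVis is not described here. As an assumption, the sketch below treats the graph as unweighted and undirected, stores it as a plain adjacency dict (node -> neighbours), and uses breadth-first search, which returns a path with the minimum number of edges. The node names in the usage note are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None.

    adjacency maps each node to an iterable of its neighbours. Because the
    graph is unweighted, breadth-first search finds a minimum-edge path.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour in previous:
                continue  # already reached by an equally short or shorter path
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None  # goal is not reachable from start
```

For example, with `g = {"A": ["R1"], "R1": ["A", "B"], "B": ["R1", "R2"], "R2": ["B", "C"], "C": ["R2"]}`, the call `shortest_path(g, "A", "C")` gives `["A", "R1", "B", "R2", "C"]`; in a source graph such a path alternates between entities and reports.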
ANSWERS:
MC1.1:
Summarize the activities that happened in each country with respect to illegal
arms deals based on a synthesis of the information from the different report
types and sources. State the situation in each country at the end of the
period (i.e. the end of the information you have been given) with respect to
illegal arms deals being pursued. Present a hypothesis about the next
activities you expect to take place, with respect to the people, groups, and
countries.
To visualize the information extracted from the texts with PinWallVis, we defined two
graphs: the source graph and a social (and location) network graph. Suppose that
the text can be split into smaller parts, called reports. The source graph is a
graph whose nodes correspond to the reports and to the entities occurring in them.
Edges run only between entities and reports: there is an edge between a report R
and an entity E if and only if E is present in R.
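A minimal sketch of the source graph definition above, assuming the output of entity recognition is available as a dict from a report id to the set of entity names found in it (a hypothetical input format, not the actual NERVis export). Nodes are tagged as `("report", id)` or `("entity", name)` so that a report and an entity with the same string cannot collide.

```python
def build_source_graph(reports):
    """Build the bipartite source graph as an adjacency dict of sets.

    reports maps a report id to the set of entities recognized in it.
    Edges only ever join a report node to an entity node.
    """
    graph = {}
    for report_id, entities in reports.items():
        report_node = ("report", report_id)
        graph.setdefault(report_node, set())
        for entity in entities:
            entity_node = ("entity", entity)
            # Edge R -- E exists iff entity E is present in report R.
            graph[report_node].add(entity_node)
            graph.setdefault(entity_node, set()).add(report_node)
    return graph
```

Keeping both directions in the dict makes it cheap to go from an entity to all reports that mention it, which is the lookup used when clicking on a cell.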
First we used our named entity recognition visualization software (NERVis) to
define the nodes and edges of the source graph. Fig. 1 shows the input text with
the recognized and highlighted entities on the right, and the same entities sorted
into tables on the left.
Fig. 1: The NERVis tool
After defining the entities, the graph could be loaded into PinWallVis.
Fig. 2: PinWallVis: searching for
From the search hit list we can select which entities to draw on the screen.
Clicking on a cell makes its neighbors pop up. When more information is needed
about a source, right-clicking on its cell shows the full text (see the bottom
right of Fig. 4). In this way we could sort out the data needed to answer the
questions.
The different layouts developed for PinWallVis also helped, such as a special
DateBasedSpringLayout, in which cells that include a date field (such as almost
every report’s title and the recognized entities of date type) are ordered along
the x-axis so that earlier events are closer to the left side of the screen.
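How DateBasedSpringLayout combines dates with spring forces is not described here. The sketch below shows only the date ordering along the x-axis, under the assumption that x is linear in the date between the earliest and the latest date; the `width` and `margin` parameters are hypothetical screen settings, and undated nodes are simply left out (a full layout would presumably position them with spring forces).

```python
from datetime import date


def date_based_x(node_dates: dict[str, date], width: float = 1000.0,
                 margin: float = 50.0) -> dict[str, float]:
    """Map dated nodes to x-coordinates so that earlier dates lie further left.

    The earliest date is placed at x = margin, the latest at x = width - margin,
    and everything in between is interpolated linearly by day count.
    """
    if not node_dates:
        return {}
    first = min(node_dates.values())
    last = max(node_dates.values())
    span = (last - first).days or 1  # avoid division by zero if all dates are equal
    usable = width - 2 * margin
    return {
        node: margin + usable * (d - first).days / span
        for node, d in node_dates.items()
    }
```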
Fig. 3: PinWallVis: Entities containing the work
Fig. 4: PinWallVis: After clicking on all sources, all
entities that were mentioned together with
We used PinWallVis to find the reports mentioning each country and arranged them
in temporal order. From these, the series of events and the most recent situation
could be reconstructed for each country, and checking further reports and entities
completed the story. This process took somewhat longer than reading all the reports
once, because some reports are connected to more than one country. The result is
the following:
In the second half of 2008, at least one arms transaction was completed between
Lim Chanarong, who held a position in a rebel faction in Burma, and Nicolai
Kuryakin’s group in Moscow, mediated by the Thai arms dealer Boonmee Khemkhaeng.
The cargo ship MV Tanya, captured by pirates and later released for USD 3.2M, was
supposedly heading to
Gaza/Lebanon: The Martyrs Front of Judea (MFJ) plans to increase its activity in
May 2009 and is looking for arms from multiple sources. Muhamed Kashem, a leader of
the MFJ, and others organized a meeting to arrange a possible deal for arms of
Russian origin in
Since several prominent people connected with illegal arms dealing have died and
others have been arrested, we expect new participants to take their places.
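The per-country procedure used for MC1.1 (collecting the reports that mention each country and ordering them in time) can be sketched as follows. The input format, a list of `(report_id, report_date, countries)` tuples, is a hypothetical stand-in for what PinWallVis gives interactively; the report ids and country names in the test are placeholders, not challenge data.

```python
from collections import defaultdict
from datetime import date


def country_timelines(
    reports: list[tuple[str, date, set[str]]],
) -> dict[str, list[str]]:
    """Group report ids by the countries they mention, each group sorted by date.

    A report that mentions several countries is placed in every matching
    timeline, so the total work exceeds a single pass over the reports.
    Reports with the same date are ordered by id to keep the output stable.
    """
    timelines = defaultdict(list)
    for report_id, report_date, countries in reports:
        for country in countries:
            timelines[country].append((report_date, report_id))
    return {
        country: [rid for _, rid in sorted(entries)]
        for country, entries in timelines.items()
    }
```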
MC1.2: Illustrate the associations among the players in the arms
dealing through a social network. If there are linkages among countries, please
highlight these as well in the social network. Our analysts are
interested in seeing different views of the social network that might help them
in counterintelligence activities (people, places, activities, communication
patterns that are key to the network).
Two graphs were defined to analyze and visualize with PinWallVis the information
gained from the input text. The source graph was defined above. The social network
is a graph whose nodes are the persons appearing in any of the sources; two nodes
(persons) are connected if there is at least one source in which both are
mentioned. In the same way we built a location network and a location–person
network. In these graphs the entities of the relevant type are the nodes, and two
entities are connected if they are mentioned together in at least one source.
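The networks above are projections of the source graph onto one or more entity types. A minimal sketch, assuming each report is given as a set of hypothetical `(entity, type)` pairs: with `{"person"}` it yields the social network, with `{"location"}` the location network. With `{"person", "location"}` it connects every co-mentioned pair, including person–person pairs; whether the location–person network of PinWallVis also keeps those pairs is not stated, so treat that case as an assumption.

```python
from itertools import combinations


def co_occurrence_graph(reports, entity_types):
    """Connect two selected entities if at least one report mentions both.

    reports maps report id -> set of (entity, type) pairs; entity_types is
    the set of types that become nodes. Returns an adjacency dict of sets.
    """
    graph = {}
    for entities in reports.values():
        # A set removes duplicate names, so no entity is paired with itself.
        selected = sorted({e for e, t in entities if t in entity_types})
        for e in selected:
            graph.setdefault(e, set())
        for a, b in combinations(selected, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph
```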
The different layouts of PinWallVis help to analyze the data. Which layout makes
the graph easiest to understand depends on the structure of the data and on the
type of information the user wants to gather. A small amount of data can be shown
clearly with a circular layout. Often the most helpful layout is the spring layout,
a force-based algorithm. The purpose of force-based algorithms is to position the
nodes of a graph in two- or three-dimensional space so that all edges have roughly
equal length and there are as few crossing edges as possible.
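The exact spring layout used in PinWallVis is not specified. As an illustration only, the sketch below implements a minimal two-dimensional force-directed layout in the style of Fruchterman and Reingold: every pair of nodes repels, every edge attracts, and a shrinking step size ("cooling") lets the positions settle. This scheme pushes edges toward similar lengths; it does not minimize edge crossings explicitly, although fewer crossings tend to result. It assumes every node appears as a key of the adjacency dict; `iterations`, `size` and `seed` are hypothetical parameters.

```python
import math
import random


def spring_layout(adjacency, iterations=200, size=1.0, seed=0):
    """Return node -> (x, y) from a simple Fruchterman-Reingold style layout."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    edges = {frozenset((a, b)) for a in nodes for b in adjacency[a] if a != b}
    k = size / math.sqrt(max(len(nodes), 1))  # ideal edge length
    temperature = size / 10  # maximum step per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsive force k^2 / d between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attractive force d^2 / k along each undirected edge.
        for edge in edges:
            a, b = tuple(edge)
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node along its displacement, capped by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step
        temperature *= 0.95  # cool down so the layout converges
    return {n: (x, y) for n, (x, y) in pos.items()}
```

A fixed `seed` makes the starting positions, and therefore the result, reproducible within a run.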
We can easily gain extra information from the graph not just by looking at it but
also by using the menu. For example, the nodes with the highest degrees can be
listed. Assuming that a central person of the arms-dealing social network is
mentioned often together with different people, the node corresponding to that
person should have a high degree. (Note, however, that a relatively large complete
subgraph typically arises when all persons corresponding to its nodes were
mentioned together in a single report. The fact that a report lists several names
does not mean that those people are central to anything.)
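Listing the highest-degree nodes, as the menu option above does, amounts to sorting the nodes by their number of neighbours. A minimal sketch over an adjacency dict of sets (a hypothetical representation, not the internal PinWallVis data structure); note that one report listing k names already gives each of those people degree at least k − 1, which is exactly the caveat raised above.

```python
def highest_degree(adjacency, top=10):
    """Return the `top` nodes with the most neighbours as (node, degree) pairs.

    Nodes are ranked by descending degree; ties are broken by node name so
    the listing is stable.
    """
    ranked = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    return [(n, len(adjacency[n])) for n in ranked[:top]]
```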
Fig. 5: PinWallVis: The Minimap option is switched on, so the graph can be seen
as a whole, with the part visible on the screen marked.
Fig. 5 shows the social network graph. On the Minimap all clusters of the network
are easy to notice. On the left side of the screen a complete subgraph can be seen,
which represents the people who took part in the civil disturbance in
PinWallVis also lets us add an extra edge to the graph on the screen. For example,
an edge can easily be added between George Ngoki and Engr. Funsho Kapolalum, since,
as is easy to see from the source graph, they use the same email address.
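In PinWallVis such edges are added interactively. As a hypothetical complement, candidate edges of this kind could be found automatically: the sketch below assumes each person has already been associated with a set of attribute values (for example, email addresses found next to that person in the source graph; how that association is made is an assumption) and returns every pair of people sharing at least one value. The names and addresses in the test are placeholders.

```python
from itertools import combinations


def shared_attribute_edges(person_attributes):
    """Suggest extra person-person edges for people who share an attribute value.

    person_attributes maps person -> set of attribute values (e.g. email
    addresses). Returns a set of (person, person) pairs in sorted order.
    """
    by_value = {}
    for person, values in person_attributes.items():
        for value in values:
            by_value.setdefault(value, set()).add(person)
    edges = set()
    for people in by_value.values():
        for a, b in combinations(sorted(people), 2):
            edges.add((a, b))
    return edges
```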
Fig. 6: The social and location network extracted from all texts, visualized with
PinWallVis.
Fig. 6 shows the location and social network. In the centre of the main panel,
the cluster of people in connection with
Fig. 7: Loading information in connection with
After analyzing the graphs, connections can be spotted easily, as in Fig. 8. The
neighbors of